Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions
Xu, Anfeng, Huang, Kevin, Feng, Tiantian, Shen, Lue, Tager-Flusberg, Helen, Narayanan, Shrikanth
Speech foundation models, trained on vast datasets, have opened unique opportunities for addressing challenging low-resource speech understanding tasks, such as child speech. In this work, we explore the capabilities of speech foundation models for child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker Confusion Rate, respectively, compared to previous speaker diarization methods. In addition, we benchmark and evaluate the speaker diarization results of the speech foundation models while varying the input audio window size, speaker demographics, and training data ratio. Our results highlight promising pathways for understanding and adopting speech foundation models to facilitate child speech understanding.
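The reported gains are relative reductions, so a short sketch of how the two metrics and a relative reduction are usually computed may help. It assumes the common definition of Diarization Error Rate (false alarm + missed speech + speaker confusion time, divided by total reference speech time), with the Speaker Confusion Rate taken as the confusion term alone. The function names and the numbers in the usage line are hypothetical and are not the paper's results. Real scoring tools also handle collars and overlapping speech, which this sketch assumes has already been done.

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """DER: fraction of reference speech time that is wrongly labeled.

    All arguments are durations in seconds (assumed already scored,
    e.g. after collar handling); total_speech is reference speech time.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


def speaker_confusion_rate(confusion, total_speech):
    """Confusion component only: speech attributed to the wrong speaker."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return confusion / total_speech


def relative_reduction(baseline, improved):
    """Relative error reduction, e.g. 0.395 means a 39.5% reduction."""
    return (baseline - improved) / baseline


# Hypothetical example: a baseline DER of 0.20 lowered to 0.121.
print(round(relative_reduction(0.20, 0.121), 3))
```

Because the reduction is relative, the same 39.5% figure could correspond to very different absolute DER values depending on the baseline.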
Designing an Effective Metric Learning Pipeline for Speaker Diarization
Narayanaswamy, Vivek Sivaraman, Thiagarajan, Jayaraman J., Song, Huan, Spanias, Andreas
State-of-the-art speaker diarization systems utilize knowledge from external data, in the form of a pre-trained distance metric, to effectively determine relative speaker identities in unseen data. However, much of the recent focus has been on choosing the appropriate feature extractor, ranging from pre-trained i-vectors to representations learned via different sequence modeling architectures (e.g. …). In this paper, we argue that, regardless of the feature extractor, it is crucial to carefully design a metric learning pipeline, namely the loss function, the sampling strategy and the discriminative margin parameter, for building robust diarization systems. Furthermore, we propose to adopt a fine-grained validation process to obtain a comprehensive evaluation of the generalization power of metric learning pipelines. Using empirical studies, we provide interesting insights into the effectiveness of different design choices and make recommendations.
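To make the three design choices concrete (loss function, sampling strategy, margin), here is a minimal sketch of one common combination: a triplet loss with a margin, paired with hardest-negative sampling inside a batch. The abstract does not say which loss, sampler, or margin value the authors recommend, so this particular combination, the helper names, and the default `margin=0.2` are illustrative assumptions only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin):
    """Hinge loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """Sampling strategy (assumed): pick the closest other-speaker embedding."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


def batch_triplet_loss(embeddings, labels, margin=0.2):
    """Mean triplet loss over all (anchor, positive) pairs in a batch,
    each paired with the anchor's hardest negative."""
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        positives = [e for j, (e, l) in enumerate(zip(embeddings, labels))
                     if l == label and j != i]
        negatives = [e for e, l in zip(embeddings, labels) if l != label]
        if not positives or not negatives:
            continue  # anchor cannot form a valid triplet
        negative = hardest_negative(anchor, negatives)
        for positive in positives:
            losses.append(triplet_loss(anchor, positive, negative, margin))
    return sum(losses) / len(losses) if losses else 0.0
```

Swapping the sampler (random, semi-hard, hardest) or changing the margin alters which triplets produce non-zero gradients, which is why these choices can matter independently of the feature extractor.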